Explainability techniques for data-driven predictive models based on artificial intelligence and machine learning algorithms allow us to better understand the operation of such systems and help to hold them accountable. New transparency approaches are developed at breakneck speed, enabling us to peek inside these black boxes and interpret their decisions. Many of these techniques are introduced as monolithic tools, giving the impression of one-size-fits-all and end-to-end algorithms with limited customisability. Nevertheless, such approaches are often composed of multiple interchangeable modules that need to be tuned to the problem at hand to produce meaningful explanations. This paper introduces a collection of hands-on training materials -- slides, video recordings and Jupyter Notebooks -- that provide guidance through the process of building and evaluating bespoke modular surrogate explainers for tabular data. These resources cover the three core building blocks of this technique: interpretable representation composition, data sampling and explanation generation.
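To make the modular structure concrete, the sketch below composes the three building blocks for tabular data: a quartile-based interpretable representation, Gaussian data sampling around the queried instance, and a weighted linear model as the explanation generator. It is a minimal illustration written for this summary, not code from the training materials; `quartiles` is assumed to hold per-feature quartile thresholds, e.g. `np.percentile(X_train, [25, 50, 75], axis=0)`.

```python
# A minimal sketch of a modular surrogate explainer for tabular data.
# The three functions mirror the building blocks covered by the materials:
# interpretable representation, data sampling and explanation generation.
# All names and modelling choices here are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

def interpretable_representation(X, query, quartiles):
    """Binary encoding: does each sample share the query's quartile bin?"""
    query_bins = np.array([np.digitize(q, qt) for q, qt in zip(query, quartiles.T)])
    sample_bins = np.stack(
        [np.digitize(X[:, i], quartiles[:, i]) for i in range(X.shape[1])], axis=1)
    return (sample_bins == query_bins).astype(float)

def sample_neighbourhood(query, scale, n_samples=1000, seed=0):
    """Gaussian sampling centred on the query instance."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=query, scale=scale, size=(n_samples, query.size))

def explain(black_box_predict, query, scale, quartiles):
    """Fit a locally weighted linear surrogate; coefficients are the explanation."""
    X_local = sample_neighbourhood(query, scale)
    y_local = black_box_predict(X_local)               # query the black box
    Z = interpretable_representation(X_local, query, quartiles)
    d = np.linalg.norm((X_local - query) / scale, axis=1)
    weights = np.exp(-d ** 2 / 2)                      # locality kernel
    surrogate = Ridge(alpha=1.0).fit(Z, y_local, sample_weight=weights)
    return surrogate.coef_                             # per-feature importance
```

Swapping any one module, say, replacing the Gaussian sampler with a mixup-style sampler, leaves the rest of the pipeline untouched, which is exactly the customisability the materials emphasise.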
Predictive systems, in particular machine learning algorithms, can take important, and sometimes legally binding, decisions about our everyday life. In most cases, however, these systems and decisions are neither regulated nor certified. Given the potential harm that such algorithms can cause, qualities like fairness, accountability and transparency (FAT) are of paramount importance. To ensure high-quality, fair, transparent and reliable predictive systems, we developed an open source Python package called FAT Forensics. It can inspect important fairness, accountability and transparency aspects of predictive algorithms to automatically and objectively report them back to the engineers and users of such systems. Our toolbox can evaluate all elements of a predictive pipeline: data (and their features), models and predictions. Published under the BSD 3-Clause open source licence, FAT Forensics is open for personal and commercial usage.
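The package documents its own API; purely as an illustration of the kind of check such a toolbox automates, the stand-alone sketch below implements a disparate-impact-style comparison of positive prediction rates across a protected attribute. This is plain NumPy, not FAT Forensics calls.

```python
# An illustrative fairness diagnostic of the kind FAT Forensics automates:
# compare positive-prediction rates across groups of a protected attribute.
# This is a plain-NumPy sketch, not the package's actual API.
import numpy as np

def positive_rate_disparity(predictions, protected, threshold=0.8):
    """Flag a disparate-impact-style violation if the ratio of group
    positive rates falls below `threshold` (the classic 80% rule)."""
    groups = np.unique(protected)
    rates = {g: predictions[protected == g].mean() for g in groups}
    ratio = min(rates.values()) / max(rates.values())
    return rates, ratio, ratio < threshold

# Example: binary predictions for two groups.
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0, 0, 1])
group = np.array(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'])
rates, ratio, violated = positive_rate_disparity(preds, group)
print(rates, ratio, violated)  # {'a': 0.6, 'b': 0.4}, 0.67, True
```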
Many problems in computer vision have recently been tackled using models whose predictions cannot be easily interpreted, most commonly deep neural networks. Surrogate explainers are a popular post-hoc interpretability method for further understanding how a model arrives at a particular prediction. By training a simple, more interpretable model to locally approximate the decision boundary of a non-interpretable system, we can estimate the relative importance of the input features for a given prediction. Focusing on images, surrogate explainers such as LIME generate a local neighbourhood around a query image by sampling in an interpretable domain. However, these interpretable domains have traditionally been derived exclusively from the intrinsic features of the query image, without taking into consideration the manifold of the data the model has been exposed to in training (or, more generally, the manifold of real images). This leads to suboptimal surrogates trained on potentially low-probability images. We address this limitation by aligning the local neighbourhood so that the surrogate is trained on the original training data distribution, even when this distribution is not accessible. We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
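As a concrete picture of the baseline being modified, a minimal LIME-style image surrogate is sketched below: superpixels form the interpretable domain and neighbours are produced by occluding random subsets of them. The paper's contribution changes how this neighbourhood is drawn and weighted; the sketch itself, including the kernel width and the grey-out occlusion, is an illustrative assumption.

```python
# A minimal LIME-style surrogate for images (illustrative sketch).
# Superpixels are the interpretable domain; neighbours are created by
# occluding random subsets of them. The paper modifies this sampling step
# so that the neighbourhood respects the distribution of natural images.
import numpy as np
from skimage.segmentation import slic
from sklearn.linear_model import Ridge

def lime_image_sketch(image, black_box_predict, n_samples=500, n_segments=50, seed=0):
    rng = np.random.default_rng(seed)
    segments = slic(image, n_segments=n_segments)        # interpretable domain
    segment_ids = np.unique(segments)
    # Binary masks: which superpixels are kept in each perturbed neighbour.
    z = rng.integers(0, 2, size=(n_samples, segment_ids.size))
    preds = np.empty(n_samples)
    for i, keep in enumerate(z):
        mask = np.isin(segments, segment_ids[keep.astype(bool)])
        perturbed = image * mask[..., None]              # grey-out occlusion
        preds[i] = black_box_predict(perturbed)          # prob. of target class
    # Locality weights from similarity to the unperturbed image (all ones).
    weights = np.exp(-((1 - z.mean(axis=1)) ** 2) / 0.25)
    surrogate = Ridge(alpha=1.0).fit(z, preds, sample_weight=weights)
    return segments, surrogate.coef_                     # per-superpixel importance
```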
In this paper we elaborate an extension of rotation-based iterative Gaussianization, RBIG, which makes image Gaussianization possible. Although RBIG has been successfully applied to many tasks, it is limited to medium-dimensional data (on the order of a thousand dimensions). In images, its application has been restricted to small image patches or isolated pixels, because the rotation in RBIG is based on principal or independent component analysis and these transformations are difficult to learn and scale. Here we present \emph{Convolutional RBIG}: an extension that alleviates this issue by imposing that the rotation in RBIG be a convolution. We propose to learn convolutional rotations (i.e. orthonormal convolutions) by optimising for the reconstruction loss between the input and an approximate inverse of the transformation obtained with the transposed convolution operation. Additionally, we suggest different regularizers for learning these orthonormal convolutions. For example, imposing sparsity on the activations leads to a transformation that extends convolutional independent component analysis to multilayer architectures. We also highlight how statistical properties of the data, such as multivariate mutual information, can be obtained from \emph{Convolutional RBIG}. We illustrate the behavior of the transform with a simple example of texture synthesis, and analyze its properties by visualizing the stimuli that maximize the response in certain features and layers.
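A hedged sketch of the central idea, learning an approximately orthonormal convolution by penalising the reconstruction error between the input and its approximate inverse under the transposed convolution, might look as follows in PyTorch. All particulars (kernel size, optimiser, regulariser weight) are assumptions, not the authors' code.

```python
# Sketch: learn an approximately orthonormal convolution by asking the
# transposed convolution (with the same kernel) to invert it. A sparsity
# penalty on activations illustrates one of the suggested regularizers.
import torch
import torch.nn.functional as F

kernel = torch.randn(16, 3, 3, 3, requires_grad=True)    # 16 filters, RGB, 3x3
optimizer = torch.optim.Adam([kernel], lr=1e-3)

def training_step(x, sparsity_weight=1e-3):
    """One step: reconstruction loss + sparsity regulariser on activations."""
    z = F.conv2d(x, kernel, padding=1)                   # forward "rotation"
    x_hat = F.conv_transpose2d(z, kernel, padding=1)     # approximate inverse
    loss = F.mse_loss(x_hat, x) + sparsity_weight * z.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with a random batch of images.
batch = torch.randn(8, 3, 32, 32)
for _ in range(10):
    training_step(batch)
```

Since `conv_transpose2d` with the same weight tensor is exactly the adjoint of `conv2d`, driving the reconstruction loss to zero pushes the convolution towards orthonormality.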
In the past years, deep learning has seen an increase in usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H\&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation as well as combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration and that Test-Time Data Augmentation can be a promising alternative when choosing an appropriate set of augmentations. Across methods, a rejection of the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
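As one concrete example among the compared methods, Monte-Carlo Dropout can be sketched in a few lines: dropout is kept stochastic at test time and the predictive distribution is approximated by averaging several forward passes, with predictive entropy as the rejection score. This is a generic PyTorch sketch, not the published code framework.

```python
# Monte-Carlo Dropout (generic sketch, not the published framework):
# keep dropout stochastic at test time, average several forward passes,
# and use the predictive entropy to reject the most uncertain tiles.
import torch

def enable_dropout(model):
    """Put only dropout layers in train mode; everything else stays eval."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()

def mc_dropout_predict(model, x, n_passes=20):
    enable_dropout(model)
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean = probs.mean(dim=0)                                 # predictive distribution
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(-1)  # uncertainty score
    return mean, entropy

# Tiles whose entropy exceeds a chosen threshold are rejected before scoring.
```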
Charisma is considered as one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such a skill. Beyond that, a plethora of use cases opens up for computational measurement of human charisma, such as for tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies, or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents seems accomplishable. Beyond that, automatic measurement also appears quite feasible given the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective, including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor is that different datasets provide different skeleton formats, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how to best supervise one model with such discrepant labels. We show that simply using separate output heads for different skeletons results in inconsistent depth estimates and insufficient information sharing across skeletons. As a remedy, we propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks. The discovered latent 3D points capture the redundancy among skeletons, enabling enhanced information sharing when used for consistency regularization. Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model, which outperforms prior work on a range of benchmarks, including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and models are available for research purposes.
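The core constraint admits a compact sketch: latent 3D points are formed as weighted combinations of the input landmarks with weights summing to one, so the mapping commutes with translations of the pose. The PyTorch sketch below is an illustration only; it enforces the sum-to-one constraint with a softmax (i.e. convex combinations, whereas general affine weights may also be negative), and the sizes are arbitrary, so it is not the authors' implementation.

```python
# Sketch of an affine-combining autoencoder (ACAE) over 3D landmarks.
# Latent points are weighted combinations (weights summing to 1) of the
# inputs, making the en/decoding equivariant to translation. The softmax
# is a simplification: it yields convex rather than general affine weights.
import torch
import torch.nn as nn

class AffineCombiningAutoencoder(nn.Module):
    def __init__(self, n_landmarks=100, n_latent=32):   # illustrative sizes
        super().__init__()
        self.enc_logits = nn.Parameter(torch.randn(n_latent, n_landmarks))
        self.dec_logits = nn.Parameter(torch.randn(n_landmarks, n_latent))

    def forward(self, points):                  # points: (batch, n_landmarks, 3)
        w_enc = torch.softmax(self.enc_logits, dim=1)   # rows sum to 1
        w_dec = torch.softmax(self.dec_logits, dim=1)
        latent = w_enc @ points                 # (batch, n_latent, 3)
        recon = w_dec @ latent                  # (batch, n_landmarks, 3)
        return latent, recon

# Training minimises reconstruction error over pooled skeleton formats; the
# latent points then serve as the shared representation used for
# consistency regularization across datasets.
```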
This article concerns Bayesian inference using deep linear networks with output dimension one. In the interpolating (zero noise) regime we show that with Gaussian weight priors and MSE negative log-likelihood loss both the predictive posterior and the Bayesian model evidence can be written in closed form in terms of a class of meromorphic special functions called Meijer-G functions. These results are non-asymptotic and hold for any training dataset, network depth, and hidden layer widths, giving exact solutions to Bayesian interpolation using a deep Gaussian process with a Euclidean covariance at each layer. Through novel asymptotic expansions of Meijer-G functions, a rich new picture of the role of depth emerges. Specifically, we find that the posteriors in deep linear networks with data-independent priors are the same as in shallow networks with evidence maximizing data-dependent priors. In this sense, deep linear networks make provably optimal predictions. We also prove that, starting from data-agnostic priors, Bayesian model evidence in wide networks is only maximized at infinite depth. This gives a principled reason to prefer deeper networks (at least in the linear case). Finally, our results show that with data-agnostic priors a novel notion of effective depth given by \[\#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}}\] determines the Bayesian posterior in wide linear networks, giving rigorous new scaling laws for generalization error.
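For a sense of scale, here is a worked instance of this effective-depth quantity, with numbers chosen purely for illustration:

```latex
% Illustrative numbers only: 10 hidden layers, 10^4 training points, width 10^3.
\[
  \lambda
  \;=\; \#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}}
  \;=\; 10 \times \frac{10^{4}}{10^{3}}
  \;=\; 100,
\]
% so widening the network tenfold (at fixed depth and data) reduces the
% effective depth from 100 to 10.
```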
In this paper we study the smooth strongly convex minimization problem $\min_{x}\min_y f(x,y)$. The existing optimal first-order methods require $\mathcal{O}(\sqrt{\max\{\kappa_x,\kappa_y\}} \log 1/\epsilon)$ of computations of both $\nabla_x f(x,y)$ and $\nabla_y f(x,y)$, where $\kappa_x$ and $\kappa_y$ are condition numbers with respect to variable blocks $x$ and $y$. We propose a new algorithm that only requires $\mathcal{O}(\sqrt{\kappa_x} \log 1/\epsilon)$ of computations of $\nabla_x f(x,y)$ and $\mathcal{O}(\sqrt{\kappa_y} \log 1/\epsilon)$ computations of $\nabla_y f(x,y)$. In some applications $\kappa_x \gg \kappa_y$, and computation of $\nabla_y f(x,y)$ is significantly cheaper than computation of $\nabla_x f(x,y)$. In this case, our algorithm substantially outperforms the existing state-of-the-art methods.
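To make the gain concrete, consider an illustrative instance with assumed condition numbers:

```latex
% Assumed numbers for illustration: \kappa_x = 10^4, \kappa_y = 10^2.
\[
  \sqrt{\max\{\kappa_x,\kappa_y\}} = 100
  \qquad\text{vs.}\qquad
  \sqrt{\kappa_x} = 100,\quad \sqrt{\kappa_y} = 10,
\]
% so existing optimal methods make O(100 \log 1/\epsilon) calls to each
% gradient oracle, while the new method keeps O(100 \log 1/\epsilon) calls
% to \nabla_x f but needs only O(10 \log 1/\epsilon) calls to \nabla_y f,
% a tenfold reduction in \nabla_y f computations.
```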
This paper presents a solution to the GenChal 2022 shared task dedicated to feedback comment generation for writing learning. In this task, given a text with an error and the span of the error, a system generates an explanatory note that helps the writer (a language learner) improve their writing skills. Our solution is based on fine-tuning the T5 model on the initial dataset, augmented according to the syntactic dependencies of the words located within the indicated error span. The solution of our team "nigula" obtained second place according to manual evaluation by the organizers.
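A minimal fine-tuning sketch in the spirit of this solution is given below, using HuggingFace Transformers. The input encoding (error span marked inline with tags), the task prefix and the model size are assumptions for illustration, not the team's exact setup.

```python
# Minimal T5 fine-tuning sketch for feedback comment generation.
# The inline <err>...</err> marking, the task prefix and the model size are
# illustrative assumptions, not the exact setup of team "nigula".
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# One (source, target) pair: the error span is marked inline.
source = "generate feedback: I look forward to <err> hear </err> from you."
target = "After 'look forward to', use the gerund form: 'hearing'."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss   # standard seq2seq cross-entropy
loss.backward()                              # an optimiser step would follow

# At inference time, feedback comments are generated with beam search:
# model.generate(**inputs, num_beams=4, max_length=64)
```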